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AfiSTKACT 


This  report  examines  maximum  likelihood  parameter  estimation  for  signal 
models  characteristic  of  the  stepped  sinusoid  response  of  underwater 
acoustic  transducers.  This  includes  a  review  of  the  principal  component 
linear  prediction  mthod,  an  exposition  of  the  variable  projection 
nonlinear  least-squares  method,  a  review  of  linear  least-squares  theory 
with  special  emphasis  on  generalized  inverses  and  projection  operators, 
and  a  discussion  of  iterative  techniques  for  nonlinear  least  squares 
algorithms.  The  estimation  problem  is  found  to  to  be  particularly 
difficult  when  the  stepped  sinusoid  excitation  is  at  or  near  a  resonance 
and  the  observation  time  is  short  compared  to  the  model  transient. 
Characteristic  least-squares  error  surfaces  and  contours  are  obtained  for 
a  two  pole  high-pass  transducer  model.  A  variable  projection 
implementation  of  a  maximum  likelihood  estimator  is  used  to  study 
parameter  estimation  when  the  excitation  is  near  resonance. 


EBYWOEDS  -  signal  modeling,  maximum  likelihood  parameter  estimation, 
nonlinear  least-squ2tres,  variable  projection  functional, 
orthogonal  projection  operator,  generalized  inverse,  minimum 
norm  solution,  Factorization,  Gauss-Newton  method. 
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1.  INTRODUCTION 

Calibration  of  an  underwater  acoustic  transducers  entails,  in  paxt, 
estimating  the  steady-state  response  of  the  transducer  to  stepped 
sinusoid  excitations.  Of  particular  interest  is  the  steady-state 
amplitude,  which  is  used  to  characterize  the  transducer  radiation 
pattern.  An  inherent  problem  arises,  though,  in  making  the  response 
measurements,  since  reflections  from  measurement  volume  boundaries  can 
corrupt  the  sign'll  limiting  the  length  of  available  data.  The  desired 
signal  data  is  thus  confined  to  a  finite  observation  window  occurring 
between  the  arrival  of  the  wave  at  the  hydrophone  via  the  direct  path 
from  the  projector  and  via  the  reflected  paths  (see  Figure  1) . 

The  problems  caused  by  these  reflections  become  critical  at  low 
frequencies  (particularly  for  complex  high  power  devices)  because  the 
decaying  component  of  the  transducer’s  transient  response  may  not  settle 
to  a  negligible  level  at  any  time  during  the  available  observation 
window,  thus  making  direct  measurement  of  the  steady-state  amplitude  and 
phase  impossible.  This,  therefore,  necessitates  estimating  the  steady- 
state  information  from  the  transient  portion  of  the  response. 

In  research  previously  carried  out  at  the  Naval  Research  Laboratory, 
Underwater  Sound  Reference  Detachment,  and  at  the  Center  for 
Communications  and  Signal  Processing,  University  of  South  Florida  [1] ,  a 
signal  parameter  estimation  algorithm  utilizing  principal  component 
linear  prediction  as  described  by  Kumerasean  and  Tufts  [2] ,  was  used  for 
estimating  these  parameters.  This  method  was  found  to  yield  acceptable 
results  (as  assessed  by  the  Cramer-Rao  bounds  for  unbiased  estimates)  for 
excitation  frequencies  away  from  the  resonant  frequency  of  the 
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transducer.  For  excitation  frequencies  near  the  resonance,  hovever,  the 
mean  square  errors  of  the  estimates  were  unnacceptably  larger  than  the 
Cramer-Rao  bound,  thus  suggesting  the  need  for  a  maximum  likelihood 
estimator.  This  report  addresses  this  problem. 

The  maximum  likelihood  algorithm  presented  is  based  upon  the  variable 
projection  nonlinear  least-squaires  method  described  by  Golub  and  Pereyra 
[3] .  This  method  essentially  reduces  the  number  of  parameters  which  must 
be  optimized  iteratively  by  defining  a  new  cost  functional,  the  variable 
projection  functional,  which  is  a  function  only  of  the  observation  vector 
and  those  parameters  which  occur  nonlinear ly  in  the  signal  model.  For 
example,  the  two-pole  high-pass  model  of  a  transducer  used  in  our 
simulations  results  in  a  variable  projection  functional  which  can  be 
mapped  solely  in  terms  of  the  damping  factor  and  frequency  of  the 
decaying  component  of  the  transient.  A  contour  plot  of  this  error 
function  for  a  particular  signal  parameter  model  is  shown  in  Figure  2. 

The  goals  of  this  document  are  to  provide  an  exposition  of  the  theory 
for  obtaining  maximtim  likelihood  parameter  estimators  for  stepped  sine 
response  signal  models  and  to  report  results  of  simulation  studies  of  the 
effectiveness  of  an  ML  algorithm.  This  will  include  (1)  a  reveiw  of 
principal  component  linear  prediction,  (2)  an  exposition  of  the  variable 
projection  nonlinear  least-squares  method,  (3)  a  review  of  linear  least- 
squares  theory  with  special  emphasis  on  the  construction  of  generalized 
inverses  and  projection  operators,  and  (4)  a  discussion  of  iterative 
techniques  necessary  for  the  implementation  of  nonlinear  least-squares 
algorithms . 

In  addition  to  presenting  the  results  of  the  computer  simulations 
using  the  algorithm  outlined,  contour  and  surface  plots  of  the  variable 
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projection  functional  are  proyided.  The  parameter  estimation  problem  is 
seen  to  be  particularly  difficult  for  stepped  sinusoid  excitations  near  a 
resonance.  Maximtim  likelihood  performance  is  achieved  by  the  computer 
implementation  of  the  variable  projection  algorithm  described  herein  down 
to  a  threshold  signal-to-noise  ratio  which  depends  on  the  quality  factor 
(or  Q)  of  the  transducer  model. 


Figure  1:  Transducer  Calibration  Environment 
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2.  BACKGROUND 


This  chapter  provides  a  background  of  a  parameter  estimation 
problem  that  arises  in  acoustic  transducer  calibration.  A  model  will  be 
defined  and  the  parameter  set  which  uniquely  defines  the  signal  will  be 
chosen.  It  will  then  be  shown  that,  if  the  reflection  free  signal  is 
corrupted  with  white  gaussian  noise,  then  nonlinear  least-squares  and  the 
maximum  likelihopd  estimators  are  the  same.  A  review  of  the  principal 
component  linear  prediction  method  will  then  be  provided. 

2.1  Acoustic  Transducer  Response  to  a  Stepped  Sinusoid  Excitation 


A  suitable  model  for  a  transducer’s  response  to  a  stepped  sinusoid 
excitation  is  a  steady-state  sinusoid  (of  the  same  frequency  as  the 
excitation)  and  a  sum  of  damped  sinusoids  (corresponding  to  the  system 
poles)  [4] .  For  real  signals,  this  can  be  written  as 


K 

x(t)  =  Aq  cos(2rfQt+^Q)  +  ^ 


Aj  exp(-Ojt)  cos(2Tf jt+^j) . 


(1) 


where  K  is  the  number  of  real  system  poles  plus  the  number  of  complex 
conjugate  system  pole  pairs.  The  parameters  which  uniquely  define  the 
signal  are 

{  ^0’  ^0’  ^0’  ^1’  ^1’  ^1’  “U  •••’  ^K’  ^K’  “k}* 


Each  sinusoid  in  this  model  may  be  further  decomposed  into  its 
Cartesian  components  so  that  the  signal  poles  are  the  only  parameters 
which  enter  into  the  model  nonlinear ly.  Thus  the  signal  model  becomes 


7 


x(t)  =  Aq  cos(2irfQt)  +  Aq  sin (2x1  Qt) 
C  S 

K 


XI  exp(-a.t)  (  A.  cos(2Tf.t)  +  A.  sin(2Tf.t)  ) 
j=i  J  ^  Jc  ^  J 


(2) 


i.rom  irhich  we  obtain  the  new  parameter  set 


Having  estimated  this  par2U[ieter  set,  we  may  calculate  the  desired 


amplitudes  and  phases  from 


(4) 


2.2  Nonlinear  Least-Squares  Ifaximum  Likelihood  Parameter  Estimation 

The  parameter  estimation  approach  to  system  identification  has  proven 
to  be  a  powerful  tool  in  problems  where  the  system  of  interest  is  known, 
a  priori,  to  have  a  particular  model  structure,  M  [5] .  In  this  case,  the 
model  can  be  parameterized  as  M(8)  using  the  parameter  vector  B  C  D^, 
where  is  an  appropriate  domain.  Thus  the  family  of  models  is 

0*(B)  I  8  C  Djj}  (5) 

and  the  search  for  the  model  which  best  decribes  the  system  becomes  a 
search  for  the  best  parameter  vector,  B.  In  determining  the  best  model 
parameters,  we  invoke  some  penalty  function,  or  cost  functional,  which 
quantifies  in  some  way  the  error  of  estimation. 

Since  the  data  used  to  estimate  the  model  parameters  are  generally 
corrupted  by  additive  noise,  the  observations  are  themselves  random 
variables.  The  Uaximum  Likelihood  Estimator  (ULE)  has  been  shown  [6]  to 
be  the  best  possible  estimator  for  the  model  parameters  given  the  uniform 
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prior  probability  distribution.  In  maximum  likelihood  estimation,  we 
wish  to  find  the  parameter  vector  which  maximizes  the  probability  of  the 
observed  data.  This  is 


-ML  =  Arg  max  p(y^,  yg,--.,  Xj^lS) 

6 


(6) 


where  p(yj^,  •••»  likelihood  function. 

Let  us  assume  the  the  observations  are,  in  fact,  the  sum  of  an  ideal 


signal  x^  and  zero  mean  gaussian  white  noise,  6^.  Thus 

y  =  X  (^  +  e  . 

The  estimation  error  is  then 

e  =  y  -  X  (0) 
n  n  n 

and  the  likelihood  function  for  estimating  B  becomes 


(7) 

(8) 


Now  since  the  logarithm  is  a  monotonic  function,  maximizing  the 
logarithm  of  the  likelihood  function  yields  the  same  result  as  maximizing 
the  likelihood  function  itself.  Thus  we  may  define  the  log  likelihood 
function  as 

L(8)  =  In  {  p(y^,  yg,  yjj|0)  } 

- ln(2T)  -  N  ln((T  )  -  Z  [  y^  -  i  (8)  1^  (10) 

Of  the  terms  in  (10) ,  only  the  last  is  dependent  upon  6  and  this  term 
appears  in  the  expression  with  a  negative  sign.  Therefore,  the  parameter 
vector  which  maximizes  the  log  likelihood  function,  and  thus  the 
likelihood  function,  is  the  one  which  minimizes 
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which  is  simply  the  least-squares  functional.  Thus  we  see  that  for  the 
case  of  an  ideal  signal  in  gaussian  white  noise,  nonlinear  least-squares 
estimation  provides  the  maximum  likelihood  estimator. 

Before  moving  on  to  the  direct  nonlineax  least-squares  approach  to 
the  transducer  signal  parameter  estimation  problem,  an  indirect  approach 
will  be  described. 

2.3  The  Principal  Component  Linear  Prediction  Method 

Linear  prediction  is  a  method  of  difference  equation  modelling  used 
to  estimate  the  poles  of  exponential  signals.  Linear  prediction 
simplifies  a  typically  nonlinear  problem  by  solving  a  related  linear 
problem,  namely  estimating  the  coefficients  in  the  difference  equation. 
From  these  coefficients,  the  exponential  poles  can  be  obtained  by  forming 
and  solving  for  the  roots  of  the  prediction  polynomial  (the  z-plane 
representation  of  the  difference  equation) . 

Least  squaires  line2u:  prediction  dictates  the  solution  of  an 
overdetermined  system  of  equations  to  obtain  the  prediction  coefficients. 
Principal  component  linear  prediction  takes  this  a  step  further  and 
dictates  the  use  of  an  overmodelled  prediction  error  filter,  i.e.  a 
difference  equation  of  order  larger  than  the  expected  signal  order,  and 
the  use  of  a  raink  reduced  approximation  to  the  pseudoinverse  in  the  least 
squares  solution  for  the  coefficients.  Consequently,  principal  component 
linear  prediction  neccessitates  a  selection  process  to  separate  the 
signal  poles  from  the  remaining  estimates. 
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The  signal  parameter  estimation  algorithm  can  be  summarized  in  the 
following  three  steps: 

(1)  solution  for  the  prediction  coefficients  using  the  principal 
component  method, 

(2)  signal  pole  selection,  aind 

(3)  linear  least-squares  solution  of  the  signal  amplitudes. 

Step  1:  Solution  for  Prediction  Coefficients 

It  is  well  known  that  an  q’th  order  discrete  time  linear  system  may 
be  described  by  the  q’th  order  forward  difference  equation 
y(n)  =  aj^y(n-l)  +  a2y(n-2)  +  •••  +  a^y(n-q). 

We  may  similarly  describe  the  system  using  the  backward  difference 


equation 

y(n)  =  bj^y(n+l)  +  b2y(n+2)  +  •••  +  b^y(n+q). 

Moving  all  terms  to  the  left  hand  side  and  taking  the  z-transform  of  the 
backward  difference  equation  yields 

Y(z)  [1  -  b^  z  -  bg  -  •••  -  b^  z*^]  =  0.  (12) 

By  determining  the  coefficients  b^,...,b^  smd  equating  the  polynomial  in 
brackets  to  0,  the  system’s  z-plane  poles  can  be  found  as  the  reciprocals 
of  the  backward  prediction  polynomial  roots. 

In  linear  prediction  [11] ,  an  arbitrairy  lineair  system  is  modeled  by  a 
difference  equation  whose  order  (say,  order  L)  is  not  necessarally  equal 
to  the  system  order;  the  prediction  equation  is  said  to  be  overmodelled 
or  undermodelled,  depending  on  if  the  order  L  is  chosen  greater  or  less 


than  the  system  order.  Also,  depending  on  whether  the  coefficients  aure 


determined  for  the  forward  or  backward  difference  equation. 


the  technique 


is  called  forward  or  backward  prediction,  respectively. 
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Given  data  samples  n=0,l, . . . ,N-1,  the  system  of  backward  backward 
predictions  is 


A  b  =  -  h, 

T  ~  ” 

where  b  =  [bj^, . . .  ,bj^]  is  the  unknown  vector 

T 

coefficients,  h  =  [Tqi  •  •  •  A  is 


of  backward  prediction 
the  Hankel  data  matrix 


(13) 


^1 

^2 

... 

^2 

• 

• 

^3 

• 

• 

•  •  •  ^L^l 

•  • 

•  • 

• 

^N-L 

• 

^N-L+l 

•  • 

•  •  •  V 

^N-l 

(14) 


A  primary  difference  which  distinguishes  the  principal  component 
method  of  linear  prediction  from  that  used  in  typical  least  squares 
linear  prediction  is  first  the  use  of  a  prediction  order  L  much  larger 
than  the  known  or  estimated  system  order  (overmodelling) ,  and  then  the 
use  of  a  rank  reduced  approximant  to  the  Hankel  matrix  using  the 
singular  value  decomposition. 

Briefly  stated,  the  singular  value  decomposition  factors  an  arbitrary 
NXH  (usually  N  >  M)  matrix.  A,  as 

A  =  U  E  V^,  (15) 

where  U  is  the  NXN  orthogonal  matrix  of  left  singular  vectors,  Y  is  the 
MXU  orthogonal  matrix  of  right  singular  vectors,  and  E  is  an  NXM  matrix 
whose  only  nonzero  elements  lie  along  the  diagonal  of  the  first  M  rows. 
These  diagonal  elements  of  E,  called  the  singular  values  of  A,  are  non¬ 
negative  and  arranged  in  non-increasing  order.  If  A  is  a  numerically 
rank  deficient  matrix  with  rank  r  <  U,  then  only  the  first  r  singular 
values  will  be  nonzero.  If  the  true  rank  of  the  system  is  r  <  U,  yet 
noise  in  the  data  causes  the  numerical  rank  of  the  matrix  to  equal  M, 
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then  the  first  r  singular  values  are  called  the  principal  singular  values 
and  rank  reduction  can  be  performed  on  the  matrix  A  by  setting  to  zero 
all  but  the  these  principal  singular  values. 

Having  performed  the  singular  value  decomposition  and  rank  reduction 
of  the  Hankel  data  matrix  A,  the  Moore-Penrose  generalized  inverse  of  A 
can  be  defined  as 

A^  =  Y  (16) 

u 

Here,  E^  is  an  MxN  matrix  whose  only  nonzero  elements  are  the  first  r 
elements  along  the  diagonal  of  the  first  H  columns.  These  nonzero 
elements  are  the  reciprocals  of  the  r  principal  singular  values.  Given 
this  pseudoinverse,  the  minimtim  norm  lineau*  least-squares  solution  for 
the  backward  prediction  coefficient  vector  is 

b  =  -  A*  h.  (17) 

Step  2  Signal  Pole  Selection 

With  the  prediction  coefficients  in  hand,  a  pool  of  z-plane  pole 

estimates  can  be  obtained  by  solving  for  and  taking  the  reciprocal  of  the 

roots  of  the  backward  prediction  polynomial 

2  L 

1  +  bj^  z  +  b2  z  +  •••  +  bj^  z  =  0,  (18) 

Due  to  the  overmodeling  used  in  the  principal  component  method,  these 
roots  will  be  far  more  numerous  than  the  actual  signal  poles,  so  that  a 
number  of  these  are  ’extraneous’  roots,  from  which  the  actual  signal 
poles  must  be  distinguished.  In  the  transducer  signal  parameter 
estimation  problem,  these  signal  poles  will  include  a  steady-state  pole 
which  lies  on  the  unit  circle  in  the  z-plane  and  the  system  poles  which 
will  fall  outside  of  the  unit  circle.  Kumerasean  has  observed  [12]  that 
while  these  signal  poles  fall  outside  of  the  unit  circle,  the  extraneous 
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roots  will  fall  within  the  unit  circle.  Thus  the  location  at  which  the 
backward  prediction  polynomial  roots  fall  within  the  z-plane  provides  a 
method  of  pole  selection. 

While  Kumerasean’s  method  of  selection  works  well  at  high  signal-to- 
noise  ratios,  simulations  have  indicated  that  a  signal-to-noise  ratio 
threshold  is  reached,  below  which  this  method  of  pole  selection  cannot 
distinguish  between  the  extraneous  roots  and  the  signal  poles.  Because 
of  this,  a  more  general  method  utilizing  subset  selection  [10]  has  been 
adopted.  This  signal  pole  selection  technique  consists  of  three  parts: 

(1)  reflection  of  roots  into  the  unit  circle  and  transformation 
to  the  s-plane 

(2)  r^eplacement  of  the  excitation  pole,  and 

(3)  subset  selection  of  the  remaining  poles. 

In  part  1,  the  roots  are  selectively  reflected  about  the  unit  circle 
so  that  all  roots  fall  within  the  unit  circle  in  the  z-plane  (so  that  all 
pole  estimates  will  be  stable) .  The  roots  are  then  transformed  to  the  s- 
plane  using  the  following  equations: 


where 

and 


s.  =  -o.  *  i2Tf. 
1  1  1 


=  -  -:p  In  Izj^l 


*  -  JL  4.  -1 
^i  “  2»T 


imag(zj) 
[  real(zj^) 


where  T  is  the  sampling  interval,  s^^  are  the  s-plajie  poles,  and  z^^^  are 
the  z-plane  poles. 

In  pairt  2,  all  roots  are  examined  one  at  a  time  and  the  root  which 


lies  closest  to  the  known  excitation  pole  in  the  s-plane  is  marked  as  the 
steady-state  pole  and  removed  from  the  pool  of  roots.  The  known 
theoretical  pole  value  (aQ=0,2TfQ)  is  then  used  throughout  the  remainder 
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of  the  estimation  process  (in  the  subset  selection  of  the  other  poles  and 
the  amplitude  solution. 

In  part  3,  the  remaining  roots  are  taken  p  at  a  time  (where  p  is  the 
number  of  system  poles),  and  along  with  the  excitation  pole,  used  to  form 
the  basis  function  matrix  for  the  linear  least-squaires  equation  for  the 
amplitudes.  The  observation  vector  is  then  projected  onto  the  column 
space  of  each  of  these  basis  function  matrices  and  the  group  of  p  roots 
which  result  in  the  lowest  residual  sum  of  squares  is  chosen  as  the 
remaining  pole  estimates.  In  this  way,  the  pole  subset  which  best  fits 
the  data  in  the  least-squares  sense  is  determined. 

Step  3;  Amplitude  Solution 

The  final  step  in  the  signal  pairameter  estimation  algorithm  is  the 
linear  least-squares  estimation  of  the  signal  amplitudes.  This  is 
performed  by  constructing  the  basis  function  matrix,  F,  consisting  of 
complex  exponentials  corresponding  to  each  of  the  estimates  poles,  for 
which  the  desired  amplitudes  are  simply  linear  weighting  factors. 

Defining  a  to  be  the  vector  of  unknown  amplitudes 

S  =  [  ^Og'  ^lg»  •••  »  ^Kg  ]^' 

the  problem  is  to  solve  for  a  in  the  linear  least-squairesd  problem 

^  S  -  I* 


(19) 
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3.  YAIIABLB  PROJECTION  NONLINEAR  LEAST-SqUARES  THEORY 

This  chapter  provides  an  exposition  of  the  theory  of  the  variable 
projection  approach  to  solving  nonlinear  least-squares  problems,  as 
developed  by  Golub  and  Pereyra  [3] .  This  approach  partitions  the  unknown 
model  parameters  into  two  groups:  those  that  occiu*  linearly  in  the  model 
and  those  that  occur  nonlinearly  in  the  model.  By  performing  this 
separation  of  variables,  one  is  provided  the  opportunity  to  solve  the 
nonlinear  least-squares  problem  as  a  sequence  of  two  computationally 
simpler  problems. 

Sections  3.1  through  3.4  introduce  the  notation  and  review  vector- 
matrix  calculus  material  necessary  for  the  subsequent  derivations.  The 
remainder  of  the  chapter  derives  three  principal  results:  (1)  the 
relationship  between  the  linesur  and  nonlinear  parameters  within  the 
least-squares  framework  which  allows  the  separation  of  variables,  (2)  aui 
expression  for  the  derivative  of  the  projection  operator,  and  (3)  an 
expression  for  the  gradient  of  the  variable  projection  functional.  With 
these  results,  a  wide  range  of  standard  solution  techniques  are  available 
for  minimizing  the  least-squares  functional  in  the  two-step  procedure 
described. 

3.1  Parameter  Model  Definition 

The  variable  projection  method  is  applicable  to  parameter  estimation 
problems  in  which  the  signal  model  can  be  decomposed  as  a  set  of 
nonlinear  basis  functions  weighted  by  a  set  of  coefficients.  These 
nonlinear  basis  functions  are  a  function  only  of  the  independent  variable 
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(say,  time)  and  a  set  of  parameters  which  we  will  call  collectively  the 
nonlinear  parameter  vector.  Symbolically,  we  may  write  this  as 

U 

x„=i:  a.  f,(£,t  )  ,  (20) 

“  j=l  J  J  “ 

alternatively  written  as 

=  fcg-V  a  (21) 

where 

£  =  [  d^,  ...  ,  (22) 

=  nonlinear  parameter  vector, 
a=  ^a^,  a2,...,  a^j  j  (23) 

=  linear  parameter  vector, 

and 

=  [  li(«.V.  12(2. V . ll((».V  ]  (2<) 

=  basis  function  vector. 

3.2  The  Least-Squares  Functional 


The  parameter  estimation  problem  attempts  to  form  an  estimate,  x^,  of 
an  ideal  signal,  x^,  from  observed  data,  presumably  of  the  ideal  signal 
in  noise.  This  is  written  symbolically  as 

y^  =  x^  +  w^,  n  =  0,1,..., N-1  (25) 

where  w  are  independently  distributed  random  variables  and  x  are  as 
n  n 

described  in  the  previous  section. 

In  forming  this  estimate  of  x^,  we  seek  to  minimize,  by  appropriate 
choice  of  a  and  £,  the  Least-Squares  Functional  (LSF) ,  defined  ais 
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Here,  e(a,d)  is  the  signal  estimation  error  vector.  Since  the 
Euclidean  norm  (2-norm)  ||*||2  Is  used  throughout  this  development,  the 
subscript  will  subsequently  be  dropped  for  convenience. 

Let  us  nov  define  the  independent  variable  vector 

t  =  [  tQ,  tj^,  ...  ,  tjj.j  ]^, 

and  irrite  the  observed  data  in  vector  notation  as 


(30) 


Let  us  also  define  the  basis  f\mction  matrix 


F(£,t) 


• 

• 

= 

fl(£»V  *  *  ’ 

•  •  • 

•  •  • 

• 

•  •  • 

^l(i>^_l)  •  •  • 

(31) 


whose  elements  are  independent  of  the  linear  parameters  and  whose  rows 
each  correspond  to  an  element  of  the  observation  vector. 

We  may  now  define  the  LSF  in  terms  of  the  basis  function  matrix  by 
writing 


=  II  l-K«)  • 


2 


I 


(32) 
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vhere  we  have  dropped  the  explicit  time  dependence  for  convenience. 

is  minimized  when  V^(a,^=0,  where  V  is  the  gradient  operator. 
While  developing  an  expression  for  this  gradient  (of  the  LSF) ,  the  linear 
and  nonlineair  parts  of  the  model  will  be  separated  and  a  new  cost 
functional,  the  variable  projection  functional,  will  be  defined. 

3.3  Differentiation  with  Respect  to  a  Vector 


Consider  the  vector 


25  =  [  *1»  »  *K  ]^' 

Differentiation  of  a  scalar  function  $(x)  with  respect  to  x  yields 


a«(^  a«(^ 

ax  “  axj^  *  axg  * 


a«(x)  IT 


A  particularly  interesting  case  of  this  is  the  quadratic  form, 

f (x)  =  Ax 

where  A  is  a  K  x  K  constant  matrix.  For  this  case,  we  obtain  simply 


a»(x) 


=  2  A  X. 


Now  consider  the  (row)  vector  function  f  (x)  defined  by 

f  (x)  =  [  ]' 

Differentiation  of  f  (x)  with  respect  to  x  yields  the  K  X  M  Jacobian 


matrix 
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ai'^Qc) 

6x 


dx. 


9fl(x) 

axi 

3xi 

afj,(20 

3x2 

3x2 

• 

•  • 

• 

•  • 

• 

•  • 

9fu(20 

K 


dXt: 


(36) 


Finally,  consider  the  N  X  M  matrix  function  F(x,^  defined  by 
{  }^.  =  fj(x,t^)  ,  i=l,2,...,N,  j=l,2,...,M. 

3P(x,^ 

Defining  D(F)  =  — -  ,  the  resulting  derivative  will  be  an  NXMxK 

tensor  (three-dimensional  array)  defined  by 

(“Wjijk'  i  =  . i  =  . 

^  k=l,2,...,K.  (37) 

Note  that  this  derivative  tensor  can  be  viewed  as  a  series  of  partial 
derivative  matrices,  or  ’slabs’,  each  corresponding  to  differentiation 
with  respect  to  one  of  the  variables  in  the  vector.  This  type  of 
differentiation  is  described  in  [3]  and  is  called  the  Frechet  derivative 
of  a  mapping. 


3.4  Differentiation  of  the  Squared  Norm 


The  squared  norm  of  a  vector  function  f (3^  is  a  scalar  function 
7(^  defined  by 


(38) 


(39) 


20 


Differentiation  of  this  with  respect  to  x  thus  yields 


9700 

9x 


9700  97  (x) 


9x. 


9x, 


97(20 

axj 


where 


97(20  ^  _  9f^00 

“a -  =  2  ^2  f  •  (2O  — a - 

9Xk  2  ^  9x, 


3=1  -  ■  k 

Substituting  (41)  into  (40)  and  simplifying,  we  obtain 

M 


97(20 

9x 


2  T.  fjCi) 

j=i  J 


9x, 


M  9f.(x) 

2  f  j  (2O  gx 

j=l  J  °*2 


2  H  fj(x)  —g^ 

3=1  ^ 


9f^(20 
X 


9^00 

=  2-5^  f(x) 


(40) 


(41) 


(42) 


3.6  Gradient  of  the  Least-Squares  Functional 

Recall  the  Least-Squares  Functional  (LSF)  given  by 

^(2»£)  =  1 1  «(2>£)  1 1 

where  2  is  the  N-vector  of  observations, 

a  is  the  M-vector  of  linear  parameters, 

6  is  the  K-vector  of  nonlinear  parameters,  and 
F(£)  is  the  N  X  U  basis  function  matrix. 
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T  T  T 

By  paxtitioning  the  overall  parameter  vector  as  [a  ,£  ]  ,  we  may 
write  the  gradient  function  in  partitioned  form  as 


da 

9^(2, £) 
dd 


(43) 


A  critical  point  of  ^(a,£)  is  found  by  evaluating  V^(a,5)=0,  which,  in 
general,  requires  the  simultaneous  satisfaction  of 


and 


3a 

3^(a,fi) 

Q6 


=  0 


=  0. 


(44) 


(45) 


Let  us  focus,  for  the  moment,  on  the  evaluation  of  (44).  Applying 
(42)  to  (32) ,  we  obtain 
9^(a,£) 


IT-  =  2  { ^  [  1  -  '(2)  5  ]^ }  [  I  -  n«)  i  ] 

'  -2  { -ir  [  ^  2 1 

=  -2  [  I  -  F(9)  a  ] 


(46) 


(47) 


3a" 


where  it  has  been  noted  that 

8F^(«) 

-er-  =  o  “sr 

Equating  (47)  to  zero  and  rearranging  yields 


=  I. 


f'^(£)  P(£)  a  =  P^(f)  z 


(48) 


which,  for  a  given  0,  represent  the  linear  least-squares  normal  equations 
in  which  the  observation  vector  z  projected  onto  the  range  of  the 
basis  function  matrix  to  obtain  the  vector  a.  This  result  will  be 


utilized  in  the  next  section  in  developing  a  cost  functional  for  the 
nonlinear  parameters  independent  of  the  linear  parameters. 
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3.6  The  Variable  Projection  Ftmctional 


In  the  previous  section,  it  was  shown  that  differentiating  the  least- 
squares  functional  with  respect  to  the  linear  parameter  vector  and 
evaluating  at  zero  led  to  the  linear  least-squares  normal  equations  for 
a.  The  Gauss-Uarkov  Theorem  [6]  demonstrates  that  the  Moore-Penrose 
generalized  inverse,  or  pseudoinverse,  provides  the  minimum  vsiriance 
solution  to  the  linear  least-squares  problem.  Thus,  given  a  maximum 
likelihood  estimator  for  the  nonlinear  parameter  vector,  we  may  write  the 
maximum  likelihood  estimator  for  the  linear  parameters  as 


(49) 


# 

where  F”  denotes  the  Moore-Penrose  generalized  inverse. 

By  substituting  a^^  back  into  the  LSF,  we  are  able  to  transform  the 
minimization  problem  into  one  in  which  we  first  minimize  with  respect  to 
the  nonlinear  parameters,  and  then  solve  for  the  lineair  parameters  as  a 
linear  least-squares  problem.  This  technique  leads  to  the  Variable 
Projection  Functional  (VPF)  which  is  defined  as 


(50) 

2 

II  I-F(9)  I  II 

(51) 

II  1  1 2 

11  (I-  V  i 

(52) 

111  119 

1  ^F  I 

(53) 

Here  P-  =  F(£)  F(£)  is  the  projection  operator  onto  the  column  space 

1 

of  the  basis  function  matrix  and  Pp  =  I  -  Pp  is  the  projection 
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operator  onto  the  orthogonal  complement  of  the  column  space  of  the  basis 
function  matrix. 

The  argtiment  just  laid  out  is  the  key  to  the  variable  projection 
method.  Golub  and  Pereyra  [3]  provide  a  proof  that  minimization  using 
the  variable  projection  method  leads  to  the  same  critical  point  as  'would 
the  traditional  least-squares  solution  technique  in  which  the  LSF  is 
minimized  with  respect  to  all  parameters  simultaneously. 

3.7  The  Derivative  of  the  Projection  Operator 

In  developing  the  gradient  for  the  Variable  Projection  Functional 
(VPF) ,  an  expression  for  the  derivative  of  the  projection  operator  with 
respect  to  the  nonlinear  parameter  vector,  D(Pp),  will  be  needed.  It  is 
also  useful  in  itself  when  minimizing  the  VPF  using  the  Gauss-Ne'wton 
iterative  scheme. 

It  is  important  to  note  prior  to  this  development  that  although  the 
generalized  inverse  (g-inverse)  arose  in  our  application  from  the  linear 
least-squares  normal  equations,  and  thus  implied  the  Uoore-Penrose 
pseudoinverse,  the  formation  of  the  VPF  and  its  derivative  require  a 
g-inverse  suitable  for  forming  the  projection  operator  only,  thus  this 
g-inverse  need  satisfy  only  (54)  -  (56)  below.  A  heuristic  argximent  for 
this  is  that  projecting  a  vector  onto  the  column  space  of  a  matrix  is  a 
simpler  task  than  finding  the  minimum  norm  linear  least-squaires  solution, 
thus  the  requirements  upon  the  g-inverse  can  be  less  stringent.  A  more 
formal  treatment  of  this  matter  will  be  provided  in  the  next  chapter, 
while  the  present  discussion  will  proceed  with  (54)  -  (58)  as  assertions. 
The  first  three  assertions  are  properties  of  g-inverses,  (57)  is  an 
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expression  for  the  projection  operator,  and  (58)  is  the  product  rule  of 
differentiation.  The  symbol  F*  mill  denote  the  g-inverse  of  the  matrix  F 
throughout  the  discussion. 


F  F  F  =  F 

[  F  F^  =  F  F^ 

F""  F  f"’  =  F"" 

Pp  =  F  F" 

D(AB)  =  D(A)  B  +  A  D(B) 

Combining  (54)  and  (57)  and  then  applying  eq.  (58) ,  we  obtain 

D(F)  =  D(PpF) 

=  D(Pp)  F  +  Pp  D(F)  . 

Rearranging  yields 

D(Pp)  P  =  D(P)  -PpD(P), 

and  recalling  that 


we  see  that 


1 

Pp  =I-^F' 
1 


D(Pp)F  =  Pp  D(F). 

Postmultiplying  by  F^  then  yields 

D(Pp)  Pp  =  D(Pp)  P  F^ 

=  Pp**"  D(P)  F^. 

Transposing  the  left-hand  side  of  this  equation  yields 

[  =  'f''  [  ]^- 


If  we  now  partition  D(Pp)  as 


D(Pp)  = 


9P,r  I  SPp  I 


ad. 


aPr 


ad. 


I  •••  I 


ad 


K  J 


(54) 

(55) 

(56) 

(57) 

(58) 


(59) 


(60) 
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then  transposition  within  the  derivative  tensor  is  equivalent  to 
transposition  within  each  of  the  partial  derivative  ’slabs’  (see  section 
3.3)  shown  in  the  above  partition.  Thus 


Now  noting  the  symmetry  of  the  projection  operator  and  its  pairtial 
derivative  (the  projection  operator  is,  by  definition,  symmetric  and 
idempotent)  we  have  shown  that 

[  D(Py)  ]’'  =  D(Pp)  and  [  D(Pp)  Pp  ]’'  =  Pp  I)(Pp).  (61) 

Now  noting  that  the  projection  operator  is  idempotent,  we  write 

(Pp)“=  Pp. 

Substituting  this  into  (58) ,  we  obtain 

D(Pp)  =  D(Pp  Pp)  =  D(Pp)  Pp  .  Pp  D(Pp) 

which,  after  applying  (60)  and  (61) ,  becomes 

D(Pp)  =  Pp-^  D(P)  P^  d.  [  D(Pp)  Pp  y.  (62) 

Again  using  (60)  in  the  rightmost  term,  we  obutain  an  expression  for  the 
derivative  of  the  projection  operator  which  can  be  evaluated, 

D(F)  (63) 

S.8  The  Gradient  of  the  Variable  Projection  Functional 

Armed  with  the  derivative  of  the  projection  operator,  we  may  now  form 
an  expression  for  the  gradient  of  the  Variable  Projection  Functional 
(VPF). 

Recall  the  definition  of  the  VPF,  given  as 


2 
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Applying  (42) ,  i . e .  taking  the  derivative  of  the  VPF  -with  respect  to  the 
nonlinear  patrameter  vector,  ve  obtain  the  gradient  of  the  YPF  as  follows: 

2  -ll  {  [  I  I’'  }  i 
--li  D'p^ 

-  2  K  {  [  ]^  }  fp'^  I  • 

Noting  that 

•we  obtain  i 

I  =  -  -L  BCTp)  Tp  I  ■  W 

If  we  now  substitute  (63)  for  the  derivative  of  the  projector,  we  get 

I  ^*2^  =  -  {  fp^  *  [  fp'''  Hf)  p*  fp'^  I 

-  -  Pp^  I>(P)  P*  Pp'*'  I  -  [  Pp'^  B(P)  pT  ^p'^  i 

=  -  Pp'*’  B(P)  P*  Pp'^  I  -  1  "CP’^)  Pp'^  I  •  (65) 

In  arriving  at  (65) ,  we  recognized  that  Pp  is  symmetric  and 
idempotent,  therefore 


Now  noting  that 

p*  Pp  -  p*  [  *  -  ^  ®- 

we  see  that  the  first  term  in  the  gradient  becomes  zero,  leaving 

5  '1*2^  '  -  (P*)^  “(p’^)  Pp^  I 

=  -  [  l’’  Pp'*'  PCP)  r*  I  ]''• 


(66) 
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Equations  (63)  and  (66)  proyide  the  ability  to  use  any  of  the 
graidient  oininization  techniques,  which  will  be  stimmarized  as  part  of 
Chapter  5,  as  well  as  the  variable  metric  techniques,  in  solving  the 
variable  projection  nonlinear  least-squares  problem. 
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4.  LINEAR  LEAST-SQUARES  THEORY 

As  is  clear  from  the  preceeding  chapter,  linear  least-squares  theory, 
particularly  the  concepts  of  the  generalized  inverse  and  the  orthogonal 
projection  operator,  are  fundamental  to  general  least-squares  theory. 

The  purpose  of  this  chapter  is  two-fold.  The  first  objective  is  to 
review  in  some  detail  the  characteristics  of  the  projection  operator  and 
the  generalized  inverse.  The  second  objective  is  to  summarize  what 
works,  what  doesn’t,  2uid  when,  as  the  popular  QR  factorizations  are 
applied  to  forming  projection  operators  and  solving  linear  least-squares 
problems . 

Section  4.1  focuses  on  the  orthogonal  projection  operator.  Here, 
following  the  the  work  of  Halmos  [13],  the  properties  of  the  projection 
operator,  namely  that  it  is  idempotent  and  symmetric,  will  be  discussed. 
The  eigenstructure  of  this  operator  is  also  examined. 

Sections  4.2  through  4.4  examine  the  properties  of  the  generalized 
inverse  for  finding  solutions  to  consistent  equations  and  lineeu:  least- 
squares  problems  and  finding  minimum  norm  solutions  to  consistent 
equations.  The  Moore-Penrose  pseudoinverse,  which  leads  to  the  minimum 
norm  least-squares  solution,  will  then  be  examined. 

Sections  4.6  through  4.9  look  at  the  ^R  factorization,  which  is  seen 
in  Section  4.6  to  lead  to  a  generalized  inverse  which  is  adequate  for 
solving  full  rank  linear  least-squares  problems.  Sections  4.7  and  4.8 
look  at  the  rank  deficient  case,  and  give  results  for  what  we  will  call 
the  truncated  QR  factorization  and  for  the  complete  orthogonal 
factorization  as  outlined  by  Hanson  and  Lawson  [14] ,  Golub  and  Pereyra 
[15] ,  and  Golub  and  Van  Loan  [16] .  Here  it  is  seen  that  while  the  g- 
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inverse  formed  using  the  truncated  factorization  does  not  lead  to  the 
Uoore-Penrose  pseudoinverse,  the  complete  orthogonal  factorization  does 
achieve  the  minimum  norm  solution,  and  thus  provides  an  alternative  to 
the  singular  value  decomposition  for  performing  the  necessary  rank 
reduction.  Finally,  the  application  of  the  complete  orthogonal 
factorization  to  nearly  rank  deficient  matrices  is  discussed  breifly  in 
section  4.9. 


4.1  The  Projection  Operator 


N  N 

Consider  the  N-dimensional  space  E  and  a  linear  subspace  S  C  E  . 

There  exists  a  linear  subspace  S  ,  called  the  orthogonal  complement  of 

S  in  E^,  such  that  E^  is  the  direct  sum  of  S  and  S  . 

Drawing  from  the  work  of  Halmos  [13] ,  the  following  definitions  and 

equations  (67)  through  (73)  characterize  the  projection  operators. 

Definition  There  exists  an  operator,  Pg,  called  the  projector  onto  S, 

which  maps  every  vector  in  E^  onto  S. 

Definition  There  exists  an  operator,  Pn  ,  called  the  projector  onto 

®  1 

the  complement  of  S  in  E  ,  which  maps  every  vector  in  E‘'  onto  S 


Furthermore , 

Now  consider  the  vectors 

X  C  S 

where 


^  ■  ^S' 


I  C  s 


(67) 


z  CeN, 


2  =  *1  +  22' 


1 


Zj^  C  S  and  Zg  C  S 


The  projection  operators  defined  above  satisfy  the  following  six 


relationships: 
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1 


CO 

IX 

II 

IX 

(68) 

° 

1 

Psi  =  o 

(70) 

31  =  1 

1 

(72) 

Ps  z  =  z. 

(69) 

(71) 

(73) 


From  (68)  and  (71),  the  projection  operator  is  easily  shorn  to  be 
idempotent . 

a 

therefore 


s  s)  =  1*8 


p  ^  =  p 


Also,  «e  see  from  (70)  and  (73)  that 

1 


(74) 


so  that 


1 


P«  ?a  =  0. 


Pg  (Pg  2)  =  0 

Sifflilairly,  from  (69)  and  (72),  ve  see  that 

Pg^  (Pg  z)  =0  so  that  Pg"^  *g 
Finally,  from  (72)  and  (73),  and  from  the  definition  of  orthogonality 

(Pg  s)^  (Pg  »)  =  0. 

Substituting  (67)  and  transposing  within  the  parantheses 

Pg^  (I  -  Pg)  Z  =  0. 

Since  this  is  true  for  all  z  in  E^, 


(75) 


p  T  p  T  p 

S  S  S' 


(76) 


The  right  hand  side  is  seen,  by  inspection,  to  be  symmetric.  Therefore 
the  projection  operator  must  also  satisfy 

T 


^s  =  Ps 


1 


(77) 


In  summary,  the  projection  operators  Pg  and  Pg  are  idempotent. 


symmetric,  and  mutually  annihilating. 
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Further  insight  can  be  gained  by  examining  the  eigenstructure  of  the 

projector.  Since  the  subspace  S  is  invariant  under  the  trajisformation 

Pg,  it  is  known  that  S  is  spanned  by  some  set  of  the  eigenvectors  of  Pg 

which  correspond  to  a  multiple  unity  eigenvalue  [13] .  The  remaining 

eigenvectors  span  the  complement  of  S  in  and  correspond  to  eigenvalues 

N 

of  zero.  The  eigenvectors  of  Pg  will  also  form  a  basis  for  R  . 

To  see  that  this  eigenstructure  exemplifies  the  operation  of  the 
projector,  consider  the  arbitrary  N-vector  z  in  the  space  R^,  which  has  a 
set  of  basis  vectors  u^,  i=l,...,N.  We  may  choose  this  set  of  basis 
vectors  to  be  the  eigenvectors  of  the  N  X  N  matrix  Pg,  which  is  the 
projector  onto  the  (say,  U-dimensional)  subspace  S.  We  may  now  write 

Hi*  (78) 

aind 

z  =  aj^  Uj^  +  ag  Ug  +  •••  +  ajj  Ujj  (79) 

thus 

Pg  z  =  Xi  a^  u^  +  Xg  a2  U2  +  •••  +  Xj^  a^^  Uj^.  (80) 

y  of  the  N  eigenvalues,  those  corresponding  to  the  eigenvectors  which 
span  the  subspace  S,  will  have  value  unity,  while  the  remaining 
eigenvalues  will  have  a  value  of  zero.  We  therefore  have  the  result 

y 

Pg  z  =  ^  a.  u.  (81) 

i=l  ^i  h 

where  the  indices  j^  denote  those  eigenvectors  which  span  S. 


4.2  The  Generalised  Inverse 


In  general,  an  N  X  y  matrix  A  is  a  linear  transformation  which  maps 
an  arbitrary  vector  x  from  an  y-dimensional  space  to  an  N-dimensional 
space  which  is  the  range  (column  space)  of  the  mapping  (matrix) .  We 
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desire  an  inverse  transformation  which  will  map  an  N-vector  lying  in 
the  range  of  A  back  into  an  U-dimensional  space.  If  the  vector  does  not 
lie  in  the  range  of  A,  then  the  inverse  mapping  must  first  approximate  ]r 
with  a  suitable  vector  which  is  in  the  range  of  the  mapping.  Let  us  first 
consider  the  case  where  ^  does  lie  in  the  column  space  of  A  (consistent 
equations) . 

For  the  N  x  M  matrix  A,  the  M  X  N  matrix  A*  is  a  generalized  inverse 
(g-inverse)  of  A  if 

X  =  A'*’  2 

is  a  solution  to  the  equation 


In  general,  this  requires  that 

A  A""  A  =  A.  (82) 

In  the  most  unrestricted  sense,  this  is  all  that  is  required  of  a  g- 
inverse.  If  we  wish,  however,  to  consider  the  case  of  inconsistent 
equations,  then  we  must  impose  further  restrictions  which  determine  how 
we  wish  to  approximate  ^  before  transforming. 
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4.3  g-inverse  for  Linear  Least-Squares  Solution 

Geometrically,  the  best  approximant  to  ^  which  makes  the  system 
consistent  is  the  projection  of  ^  onto  the  column  space  of  A.  Thus  for 
the  arbitrary  vector  z,  the  inconsistent  equation 

A.  X  -  z 

can  be  made  consistent  by  premultiplying  both  sides  by  the  projection 
operator  for  the  column  space  of  A,  yielding 

^A  ^  ^A  -• 

But  the  projection  of  the  columns  of  A  onto  themselves  leaves  them 
unaffected,  so  this  reduces  to 

A  X  =  z.  (83) 

The  g-inverse  solution  for  x  is  then 

X  =  A*  z. 

If  we  substitute  this  into  (83)  and  note,  on  the  right  hand  side  of  (83), 
that  the  projection  operator  is  idempotent,  this  becomes 

A  A*  P^  7  =  P^  P^  z. 

From  this,  ire  see  that  the  generalized  inverse  for  solving  linear  least- 
squares  problems  must  be  such  that 

=  A  A* 

is,  in  fact,  the  projector  onto  the  columns  space  of  A.  This  then 
requires  that  the  product  A  A^  be  idempotent  and  symmetric.  That  this  is 
idempotent  folloirs  from  (82),  so  that  no  neir  restriction  is  imposed.  We 
do  have  the  further  restriction,  though,  that  A^  must  satisfy 

[A  A'"]’^  =  A  A"". 


(84) 
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4.4  g-inverse  for  Miniotm  Nora  Solution 


We  know  from  the  Section  4.2  that  the  g-inverse  which  solves  the 
consistent  equation 


must  satisfy 


A  X  =  ][ 

A  A  =  A. 


(85) 


From  this,  it  follows  that 

k  -  A.  A*  k  =  0, 

thus 

A  [I  -  A"^  A]  =  0. 

We  can  therefore  state  that  for  any  z, 

A  [I  -  A'"  A]  z  (86) 

is  a  solution  to  the  homogeneous  equation 

A  X  =  0. 

Analogously  to  the  linear  differential  equatian,  the  general  solution  for 
the  set  of  simultaneous  linear  equations  in  (85)  is  the  sum  of  the 
homogeneous  solution  and  a  particular  solution,  and  can  thus  be  written 

X  =  A*2^  +  [I  -  A^  A]  z. 


Denote  the  g-inverse  which  leads  to  the  minimum  norm  solution  as  A*. 

m 

We  desire  then  that 


1  * 


A*  A 


for  all  2 


(87) 


A*  2.  +  [i-A*a]  z  jj^ 


Note  that 
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Thus 


This  is  a  minifflum  when  the  two  middle  terms  are  zero,  which  occurs  when 
the  particular  solution  is  orthogonal  to  the  homogeneous  solution.  In 
general,  this  requires  that 

[I  -  A""  A]  =  0. 

Thus 

[A^]*^  =  [A""]^  A*"  A. 

For  this  to  be  true,  it  is  necessary  and  sufficient  [17]  that 

A"^  A  A""  =  A""  (88) 

2Lnd 

[A"^  A]^  =  A""  A.  (89) 

From  (88)  and  (89) ,  we  see  that  the  product  A^  A  is  both  idempotent 
and  symmetric  and  is  thus  a  projection  operator.  We  will  now  show  that 
it  is,  in  fact,  the  projector  onto  the  row  space  of  A. 

A*  A  =  [A*  A]^ 

=  A^  [A^l*. 

T 

which  is  the  projection  operator  onto  the  columns  of  A  ,  which  are  the 
rows  of  A. 

In  summary,  the  g-inverse  for  obtaining  a  minimum  norm  solution  to 

A  X  = 

must  be  such  that 

aP  =  =  A*  A 

is  the  projection  operator  onto  the  row  space  of  A. 


(90) 
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Let  us  now  re-examine  the  general  solution,  now  written 

X  =  A""  jr  +  [I  -  ^P]  z 

where  we  have  substituted  (90)  into  the  homogeneous  solution.  Recall 
from  Section  4.1  that  [I  -  P]  is  the  projector  onto  the  complement  of  the 
subspace  for  which  P  is  the  projector;  thus  we  see  that  the  homogeneous 
solution  is  confined  to  the  null  space  of  A.  Since  the  minimum  norm 
solution  must  be  orthogonal  to  this  homogeneous  solution,  what  we  are 
really  striving  for  in  the  minimum  norm  solution  is  that  solution  which 
lies  in  the  row  space  of  A. 

4.5  g-inverse  for  Minimum  Norm  Least-squares  Solution 

Combining  the  results  of  the  last  two  sections,  we  see  that  the 
g-inverse  for  obtaining  the  minimum  norm  least-squares  solution,  i.e.  the 
Moore-Penrose  generalized  inverse  (pseudoinverse) ,  must  be  such  that 

=  A  A* 

and 

aP  -  A*  A 

are,  respectively,  the  orthogonal  projectors  onto  the  column  space  and 
the  row  space  of  OA,  This  is  equivalent  to  the  following; 

A  A*  A  =  A 
[A  A""]^  =  A  A'" 

A*  k  A*  =  A* 

[A""  A]**^  =  A""  A. 

It  is  interesting  to  note  that  in  forming  the  minimum  norm  linear 
least-squ2U'es  solution,  we  are  actually  performing  a  three-stage  process. 
Starting  with  the  least-squares  problem 
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A  X  =  I  (91) 

vhere  A  is  not  necessarily  full  rank,  we  obtain  the  minimum  norm  solution 

x  =  (92) 

Stage  I:  Projection  of  ^  onto  the  column  space  of  A  to  obtain  a 
consistent  set  of  equations.  This  can  be  shown  explicitly  by 
substituting  (92)  into  (91)  above  to  yield 

i  =  A  A*  I 

Stage  II:  Solution  to  the  consistent  set  of  equations 

a  A  A 

A  X  =  £ 

to  yield  the  general  solution 

X  =  A"^  £  +  [I  -  A"^  A]  z, 
where,  z  is  an  aurbitraxy  vector  in 

Stage  III:  Projection  of  x  onto  the  row  space  of  A  to  obtain  the  minimum 
norm  solution  (eliminate  the  homogeneous  part  of  the  solution) .  This  can 
be  shown  explicitly  by  substiting  (91)  into  (92)  above  to  yield 

X  =  k*  k  it 


With  this,  we  conclude  our  formal  discussion  of  linear  least-squares 
theory.  The  remainder  of  this  chapter  will  examine  the  QR  factorization 
family  as  they  are  used  for  forming  projection  operators  and  solving 
linear  least-squares  problems.  In  particular,  we  will  look  at  how  well 
the  g-inverses  constructed  with  these  factorizations  conform  to  to  the 
equations  outlined  in  this  section. 
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4.6  QR  Factorisation  of  Full  Rank  Matrices 


is  square,  upper  triangular,  and  nonsingular. 

With  this,  ire  may  vrite 

A  =  R, 

and  define  a  g- inverse  of  A  an 

I  ® 

Recalling  that  =  A  A^,  the  projection  operator  becomeas 


where  is  the  M  X  M  identity  matrix.  We  see  by  inspection  that  this  is 
symmetric,  and  by  squaring  we  see  that  it  is  idempotent. 
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where  we  have  used  the  orthogonality  of  matrix  Q.  Thus  we  see  that  the 
g-inverse  defined  in  (94)  is  adequate  for  forming  the  projection  operator 
onto  the  column  space  of  A. 

Now  recall  the  projection  operator  onto  the  row  space  of  A, 

=  i*  A 

Substituting  (94)  into  this  equation  yields 


iP  =  [  I  0  ]  «  d’ 


Thus  this  factorization  satisfies  the  requirements  for  the  Uoore-Penrose 
g-inverse.  The  least-squares  functional  for  the  linear  least-squares 
problem  A  x  -  b  then  becomes 

il  Ai-k  112 


min 

X 

mm 

min 

X 


(I  A  X  -  q  b 

II  »  X  -  q  b  ''2 


min 

X 


■  Jl' 

X  - 

A 

b 

.  0  . 

dj  ■ 

Here,  q  has  been  partitioned  as  q  = 


h 


)“ 

}  N-M 


(96) 


The  solution  for  x  in 


-1 


in  this  case  is  determined  uniquely  as  x^^g  ~ 
leaving  a  residual  sum  of  squared  error 

(•tiLs)  =  II  «2i  ll’^-  W 
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4.7  QR  Factorisation  of  Rank  Deficient  Matrices 


In  the  case  of  rank  deficient  matrices,  the  QR  factorization  does  not 
lead  to  a  g-inverse  which  satisfies  the  Moore-Fenrose  conditions.  In  this 
section,  it  is  shown  that  a  truncated  version  of  the  (IR  factorization 
with  column  pivotting  can,  however,  be  used  to  construct  a  g-inverse 
suitable  for  forming  projection  operators.  In  the  next  section,  the 
complete  orthogonal  factorization  will  be  presented,  which  solves  the 
problem  of  rank  degeneracy  and  does  lesul  to  the  Uoore-Penrose 
pseudoinverse . 

Consider  the  N  X  M  matrix  A  of  rank  r  <  M  i  N.  There  exists  an  N  X  N 
orthogonal  matrix  and  an  M  X  M  permutation  matrix  S,  such  that 


q  A  S  =  R  2 


where 


*11  = 


By  truncating  R,  i.e.  replacing  R-„  by  a  zero  matrix,  a  g-inverse  of  A  is 


A'"  =  S 


For  this  factorization,  the  projector  onto  the  range  of  A  becomes 


(100) 


P.  = 


I  ^12 
0  I  0 


(101) 
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which,  as  in  the  full  rank  case,  conforms  to  the  requirements  of  a 
projection  operator,  thus  this  g-inverse  is  suitable  if  the  formation  of 
the  projector  for  the  column  space  is  all  that  is  required.  As  we  shall 
now  see,  however,  the  product  A  does  not  form  a  suitable  projection 
operator  with  this  g-inverse. 


k*  k  =  S 


R 

*11 

0 

a  a^^ 

[»u 

*12] 

0 

0  . 

.  0 

0  . 

(102) 


While  this  is  not  symmetric  and  therefore  cannot  be  used  as  a  projector, 
it  is  interesting  to  note  that  this  factorization  does  satisfy  the  third 
requirement  of  the  Moore-Penrose  pseudoinverse,  namely  that 

k"'  k  k*  =  A. 

»  -1  I  . 


k*  k  k*  =  k*  =  S 


«  12 


s 


‘11 


=  s 


‘u' 


0  J 


«  =  A^ 


Note  also  that  this  g-inverse  satisfied  all  of  the  conditions  for 
forming  the  derivative  of  the  projection  operator  in  Chapter  2,  thus  it 
is  suitable  for  use  in  minimizing  the  variable  projection  functional  even 
when  the  basis  function  matrix  is  rank  deficient. 


4.8  Complete  Orthogonal  Factorization  of  Sank  Deficient  Matrices 

In  this  section,  it  is  shown  that  an  extension  of  the  as 
factorization,  the  complete  orthogonal  factorization,  is  a  suitable 
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alternative  to  the  singular  value  decomposition  for  performing  the  rank 
reduction  necessary  to  obtain  the  minimum  norm  linear  least-squares 
solution. 


Again  consider  the  N  X  M  matrix  A  with  rank  r  <  U  ^  N  and  the 


orthogonal  factorization 


A  S  =  S  S 


^11  ^12 

0  I  0 


where 


*11  =  oV 


There  exists  an  U  X  M  orthogonal  matrix  Y  such  that 


RV=qASV  =  l*5 


I 

0  I  0 


(103) 


where 


’  IV/1 

‘ii  =  X'  ■ 

NjrXr 


From  this,  we  may  write  A  =  %  S  VS,  and  define  the  g-inverse 

r  *  -1  I 

*11  ® 

A+  =  s  Y  - —  q 


0  0 


Now  forming  the  projection  operator  P^,  we  obtain 
P.  =  ^  1  y’^s^  S  Y  — |- 


0  I  0 


(104) 


(105) 


which  is,  once  again,  seen  to  be  symmetric  and  idempotent.  If  we  now 


attempt  to  form  the  projector  onto  the  row  space  of  A,  we  get 
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.P  =  S  V 


=  S  V 


-1 


‘11 


0  J 


hi  '  ® 


0  J 


vV 


(106) 


which  is  syonetric  and  idempotent.  Thus  the  complete  orthogonal 
factorization  leads  to  a  g-inverse  which  is  suitable  for  forming  both 
projection  operators  and  minimum  norm  least-squares  solutions. 

With  this  g-inverse,  the  least-squares  functional  becomes 
min  ^  min  1 1  .  „  ,.112 

X  X 


A  X  -  b 

=  T  II  ll"" 

=  II  dASS^-tb  ||2 

II  dAS  V  vVx  -  db  112 
1 1  i’tV  X  -  d  b 


min 

X 


If  we  let  Z  =  S'*^  X,  partition  %  as  ^  = 
V  as  V  =  [Vj^  I  Vg  ] ,  we  then  obtain 


,  and  partition 


1  1 


min 

I 


‘11 


»  q* 

R 

jnl’l 


Lfl 

1 

[Vl 

T 

1  - 

Ji' 

b 

1  0  J 

Tg  J 

<2  ■ 

I  - 


[T  T  It 

^2  J  '  obtain  the  solution 

^  *11  '1 


ii  =  T,  i;,-2  d,  b. 


(107) 
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We  caun  now  obtain  a  minimum  norm  solution  by  letting  £2  ~  yielding 


-LS  “  ^ 


’  -1 


(108) 


and,  as  in  the  full  rauik  case,  leaving  a  residual  sum  of  squares 

12 


itCiLs)  =  II 


(109) 


4.9  Near  Rank  Deficient  Vatricies 


Consider  the  N  X  U  matrix  A  with  numerical  rank  /)  =  M  ^  N,  but  whose 
expected  (ideal  rank)  is  r  ^  M.  There  exists  an  N  X  N  orthogonal  matrix, 
and  an  M  X  U  permutation  matrix,  S,  such  that 

R, 


A  S  =  R  5 


(110) 


where 


*1  = 


MXM 


Column  pivotting  at  each  stage  of  the  factorization  will  result  in  a 
matrix  which  cam  be  further  partitioned  as 


»i- 


*11 

*12 

0 

*22 

where 


‘irf^ 


rXr 


and 


‘22= 

I  “  \l(l(-r)x(ll-r) 
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If  A  were  truly  rank  deficient,  R22  consist  of  zeros.  But  because 

of  perturbations  in  A,  E22  have  non-zero  elements.  If  the 

perturbations  are  small,  however,  then  the  elements  of  R22  should  also  be 
small,  so  that  the  rank  deficiency  can  be  uncovered  when  | 1^221 ^  becomes 
much  smaller  than  |jA||.  Then  rank  reduction  can  be  achieved  by  setting 
^22  zero  and  solving  the  remainder  of  the  problem  as  a  truly  rank 
deficient  case. 

Golub  and  Van  Loan  [16]  point  out  that  there  are  cases  in  which  at  no 
step  during  the  orthogonalization  procefss  is  the  norm  of  ^22  small, 

even  though  the  oriuginal  matrix  is  rank  deficient.  But  they  also  go  on 
to  say  that  this  method  of  rank  deternmination  "works  well  in  practice . * 
The  reader  is  referred  to  Section  6.4  of  Golub  and  Van  Loan  [16] ,  and  to 
Golub,  Klema,  and  Stewart  [18] . 
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5.  ALGORITHM 


The  purpose  of  this  chapter  is  to  develop  an  algorithm  for  the 
maximum  likelihood  parameter  estimation  technique  which  utilizes  the 
variable  projection  method.  Following  Bard  [5],  Sections  5.1  through  5.4 
review  the  gradient  methods  for  iterative  minimization.  This  review  will 
culminate  in  a  discussion  of  the  Gauss-Newton  method  applied  to 
minimization  of  a  squared  norm  function  (inclusive  of  the  least-squares 
auid  the  variable  projection  functionals).  Section  5.5  will  then  utilize 
the  results  of  Chapter  3  to  formulate  the  Gauss-Newton  step  for  the 
Variable  Projection  Nonlinear  Least-squares  method.  Following  this,  a 
simplification  to  the  algorithm  noted  by  Kaufman  [19]  will  be  reviewed. 
Finally,  Marquardt’s  modification  to  the  Gauss-Newton  step  will  be 
discussed. 


5.1  Iterative  Minimization  Techniques 


Given  an  objective  functional  ^(£)  of  the  vector  of  parameters 

-  ~  [  ^1»  ^2’  * "  ’  ]^’ 

we  wish  to  determine  the  values  for  5  such  ^(£)  is  minimized.  Iterative 
minimization  techniques  [5]  generate  a  sequences  of  vectors  6^,  i=l,2, ... 
which  hopefully  converge  to  the  true  minimum  of  the  objective  function. 
The  vector  6^^  is  called  the  i’th  iterate. 

Let  us  define 


A. 

-1 


=  2i.i  -  k 

=  the  i’th  update  step 


(111) 


and 
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=  ^(£i)- 

Definition;  The  i’th  iterate  is  acceptable  if  that  is,  if  the 

addition  of  the  i’th  update  step  to  the  i’th  iterate  causes  a  decrease  in 
the  value  of  the  objective  function. 

Each  iteration  consists  of  determining 

(1)  a  vector  d^  in  the  direction  of  the  i’th  update  step,  and 

(2)  a  scalr  such  that  the  step  =  p^  d^  produces  an  acceptable 
iteration.  Thus  we  require  that  ^(£^+4^)  <  ^(4^)* 

5.2  Gradient  Methods  for  Determining  Step  Direction 


During  the  i’th  iteration,  we  strike  out  from  0^  along  a  direction  d 
generating  the  ray 

i(,P)  =  (112) 

Here  we  have  noted  that,  when  confined  to  this  ray,  9,  and  hence  ^(0), 
are  functions  of  p  alone.  We  may  now  define  the  confined  objective 
function  as 


fS>- 


(113) 


Differentiating  this  with  respect  to  p  yields 

3«id 


9/» 


■  3^  • 

T 

■  Q9  ■ 

df  ■ 

•  9£  ■ 

ep  . 

•  9£  • 

d, 


and  evaluating  at  p=Q  yields  the  directional  derivative  of 
d  at  6.,  defined  as 

-  -i’ 


(114) 
relative  to 


’id 


9»id 


Qp 


p=0 


T  A 
=  fii  d. 


(115) 


Here,  is  the  gradient  of  ^  evaluated  at 
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Qm  I 

Si  "  dd  \i=i^  ‘ 

A  small  positive  value  of  p  is  guaranteed  to  produce  a  step  which 
decresises  the  value  of  the  objective  function  if  the  directional 
derivative  at  6^^  is  negative.  Thus  we  may  define  d  as  an  acceptable 
direction  if  4  <0.  This  simply  states  that  d  is  a  downhill  direction 

on  the  contour  of  4i'  if  it  forms  a  greater  than  90*  angle  with  the 
gradient  at 

One  obvious  choice  of  direction  for  the  i’th  iterate  is  simply 

d  =  -g  (117) 

This  is  the  direction  used  for  all  iterations  in  the  steepest  descent 
method,  named  for  the  fact  that  this  is  the  direction  in  which  the 
objective  function  initially  decreases  most  rapidly.  Unfortunately,  this 
often  produces  steps  which  zigzag  back  and  forth  down  the  contour, 
leading  to  extremely  slow  convergence. 

As  an  alternative,  we  may  find  an  acceptable  direction  by  finding  a 
suitable  positive  definite  matrix  K,  and  defining 

d^  =  -  R  g. .  (118) 

The  acceptability  of  this  direction  follows  from  the  definition  of 
positive  definiteness  as  follows: 

♦id’  =  sJ  ^i  =  -  fii  <  0 

Minimization  techniques  in  which  directions  are  obtained  in  this  way 
are  called  gradient  methods.  If  the  positive  definiteness  of  R  is 
strictly  adhered  to,  then  the  method  is  called  an  acceptable  gradient 
method. 
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6.3  Newton’s  Method 


It  is  well  known  from  single  variable  calculus  that  a  zero  in  a 
function  g(x)  can  be  found  iteratively  using  the  first  order  Taylor 
approximation  around  a  point  x^,  given  by 

g(x)  2  g(x^)  +  [g’(x^)]  (x-x^). 

This  approximation  can  be  extended  to  finding  a  local  minimum  of  a 
function  f(x)  by  letting  g(x)  =  f*(x), 

f>(x)  =  f>(x.)  +  [f«(x.)](x-x.). 

Equating  this  to  zero  and  rearranging,  the  i’th  iteration  becomes 

provided  f  ■  (x^^)  ^0 . 

Extending  this  to  multivariable  calculus,  we  obtain  the  update 
relation 


e.  e.  -  E. 

-1+1  -1  1 


(120) 


where  £  and  g  are  as  defined  in  the  previous  section  and  =  H(£  i)  is 

the  Hessian  matrix  evaluated  at  6..  The  Hessian  matrix  is  defined  by 

-1  ' 

r  .  mD 

I  )mn  ~  de  d9  ’ 

m  n 

Note  here  that  H^  must  be  nonsingular. 

Newton’s  method  may  be  alternatively  viewed  as  a  second  order  Taylor 
series  approximation  to  the  original  function,  given  by 

i(£>  =  Hij)  *  tj  [  ]  *  4"  1  [  2-2i  ] 


which  is  the  best  second  order  approximation  to  the  original  function. 
Differentiating,  we  obtain 
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QHD  r  1 

-Qf-  =  £i"  [^^iJ  (123) 

■which,  when  set  to  zero  yields  the  recursion 

«i*l  =  ii  -  Si-  (124) 

This  relation  satisfies  the  general  formula  for  a  gradient  method 
iteration  with  p^=l  and  Rj^=H  1.  If  H  is  positive  definite,  then  H  ^  will 
also  be  positive  definite,  and  Newton’s  method  will  produce  acceptable 
iterations.  Furthermore,  if  the  objective  function  is  quadratic,  then 

A 

^(5)  =  ^(5)  and  Ne-wton’s  method  will  converge  in  a  single  iteration. 

In  order  to  avoid  calculating  second  derivatives,  one  may  use  the 

/ 

Gauss  approximation  to  Newton’s  method,  or  the  Gauss-Newton  method,  which 
requires  only  the  evaluation  of  first  derivatives.  This  will  be  brought 
to  light  in  the  next  section. 

5.4  Gauss-Newton  Method 


Consider  an  objective  function  of  the  form 

2 

=  II  e(£)  II  (125) 

=  Z  e.^  (126) 

j=l  J 

Among  others,  this  form  includes  the  least-squares  and  variable 
projection  functionals. 

Differentiating  with  respect  to  the  m’th  component  of  the  parameter 
vector  (and  dropping  the  iteration  index  for  convenience)  yields  the  m’th 
component  of  the  gradient  vector 


m 


e. 

2 


(127) 
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Now  differentiating  this  with  respect  to  the  n’th  component  of  the 
parameter  vector  yields  the  typical  component  of  the  Hessian  matrix 


N 


{ H0 =  2  r 


3e . 

3 

3e . 

3 

35 

m 

35 

n 

N 

2  Z  e 
j=l  J 


a2 

0  e. 

J. 


dd  dd 

m  n 


(128) 


Near  a  minimum  of  ^(£)i  the  error  e.  will  be  small  and  will  make  the 
second  term  above  negligible  compared  to  the  first.  It  is  by  neglecting 
this  term  that  we  obtain  the  Gauss  approximation  to  the  Hessian  matrix, 
given  as 


N 


{ n(2)  =  2  2: 

3=1 


3e.  1 

3 

Be . 

3 

35 

m  J 

35 

n 

(129) 


Let  us  now  define  the  cost  function  derivative  matrix 

8e  I  3e  I  I  de 


B  = 


35, 


35, 


I  •••  I 


35,, 


(130) 


'1  I  ““I  I  I  “"K 
Having  thus  defined  this  derivative  matrix,  we  may  now  rewrite  both 
the  gradient  vector  and  the  Gauss  approximation  to  the  Hessian  matrix  in 
terms  of  the  derivative  matrix  as 


S  =  2  e  (131) 

and 

N  =  2  b”^  B  (132) 

If  we  now  substitute  the  Gauss  approximation  to  the  Hessian  matrix 
into  the  gradient  method  equation  for  the  step  direction,  i.e.  B  =  N~^, 
we  obtain 

d  =  -  N'^  g, 
or 

N  d  =  -  g.  (133) 

Now  substituting  in  (131)  and  (132)  above,  we  get 

B*^  B  d  =  -  B^  e. 


(134) 
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But  these  are  just  the  normal  equations  for  the  linear  least-squares 
problem  in  'which  the  error  vector  is  projected  onto  the  range  of  the 
derivative  matrix  to  obtain  the  vector  d.  Thus,  the  solution  for  d  at 
each  iteration  is  simply 

d  =  -  e,  (135) 

where  B"  is  a  Moore-Penrose  pseudoinverse  of  B. 

5.5  Variable  Projection  Nonlinear  Least-Squares  Gauss-Newton  Iteration 


This  section  will  combine  the  results  of  the  present  chapter  and 
those  of  the  previous  two  chapters  to  devise  an  algorithm  for  the 
Variable  Projection  method.  Recall  that  the  Gauss-Newton  step  is 
obtained  from 

d  =  -  e, 

where 

5e  I  9e  1  I  5e  9e 

a^l  I  9^1  I  j  95jf  g^T 


From  Chapter  3,  equation  (53),  we  have 


=  -  d(  Pp  )  Z  (136) 

Substituting  (63)  for  the  Frechet  derivative  of  the  projection  operator 


yields 
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B  =  -  Pp-*-  D(F)  T*  z  -  [  D(P)  P^  y  z- 


(137) 


Now  recall,  from  Section  4.7,  the  factorization  of  the  N  X  M  matrix  F 


given  by 


ft  F  S  = 


*11  I  *12 


where  ft  is  an  N  X  N  orthogonal  matrix  and  S  is  an  M  X  M  permutation 
matrix.  From  this  we  had  defined  the  g-inverse  of  F 


F"”  =  S 


»ir‘  I  •> 


The  projection  operator  onto  the  column  space  of  F  is  then 


Pp  =  r 


from  which  we  can  define  the  projector  onto  the  orthogonal  complement 


of  F  in  R 


P.  =  ft^ 


0  1  I 


To  simplify  notation,  let  us  define 


^1  = 


0  I  0 


I  =  - ! - 

^2  0  I  I, 


,  and 


vV 


With  these  definitions,  we  can  write  the  g-inverse  and  the  projection 
operators,  respectively,  as 


F  =  S  Ej^ 


(139) 


Pm  =  r  I, 


(140) 


X  T 
^F  ~  ^2 


(141) 


Substituting  these  definitions  into  (137)  yields 
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B  =  -  Ig  q  D(F)  s  q  z  -  (  I2  ft  d(f)  s  r^'^  q  Y  z.  (142) 

This  equation  can  be  regrouped  as  follows  to  demonstrate  how  one  might 
implement  the  formation  of  the  matrix.  First,  taking  the  transpose  of 
the  second  term  yields 

(  q'’^  I2  q  D(F)  S  R^-^  ft  Z  =  ft*”^  (  [  ft  ^2  ^  ^ 

If  we  now  let  v  =  q  jr  and  C  =  q  D(F),  and  x  =  F^  jr  we  obtain 

5.6  Kaufman’s  Variable  Projection  Algorithm 

A  much  simpler  version  of  the  projection  operator  derivative,  amd 
thus  a  simpler  version  of  (142) ,  was  derived  by  Kaufman  [19] .  By 
exploiting  the  structure  of  the  projection  operator  and  the  isometric 
properties  of  the  orthogonal  matrix  q,  Kaufman  has  shown  that  the  second 
term  on  the  right  hand  side  of  (142)  can  effectively  be  ignored. 

Noting  that  q  hats  orthonormal  columns,  we  know  that 


•  II  i  II 

=  l|l2«l||- 

Following  Kaufmaui,  we  can  define  the  new  objective  function 

#3(£)  =  II  I  II 

where  we  have  pairtitioned  q  as 
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111  }  M 

(1=  i 

h  J)  N-M 

While  the  derivative  of  ^  is  dependent  upon  the  orthogonalization 
process  in  vhich  the  matrix  ^  is  determined,  and  is  therefore  not  unique, 
Kaufman  derives  the  following  general  formula  whose  results,  though 
nonunique,  are  similar  "within  an  orthogonal  transformation”: 

D(Jl2)  =  -  ^  D(r)  Sj  q  +  Z  (I  (145) 

where 

+  Z  =  0.  (146) 

Since  the  matrix  Z  is  not  unique,  neither  is  0(^2)*  however, 

choose  Z  =  0,  which  certainly  satisfies  (146) ,  leaving 

B  =  D(«2)  I  =  -  «2  D(?)  Sj  (  I  ,  (147) 

which  is  the  same  result  derived  by  Golub  and  Pereyra  with  the  modified 
projection  operator  and  the  second  term  in  (142)  disregarded. 

With  this  definition  for  the  derivative,  the  Gauss-Newton  step 
direction  becomes 

d  =  (  I>(B)  Sj  «  Z  )*  «2  I- 

where  we  have  partitioned  S  ais 

»  =  (®i  1 52]- 

6.7  Marquardt’s  Modification 

Recall  that  for  a  gradient  method  to  produce  acceptable  steps,  the 
matrix  R^  has  to  be  positive  definite.  Neither  Newton’s  method  nor  the 
Gauss  approximation  to  it  ensure  that  R^  would  be  positive  definite,  so 
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that  they  cannot  be  considered,  in  their  original  form,  acceptable 
gradient  methods .  Noting  an  observation  by  Uaxquairdt  [20] ,  ire  may  force 
the  matrix  in  both  cases  to  be  positive  definite,  thus  making  the 
methods  acceptable. 

For  some  positive  definite  matrix,  P,  any  matrix  A  can  be  made 
positive  definite  by  adding  XP,  provided  that  the  positive  scalar  X  is 
large  enough.  Thus  ve  can  ensure  that  an  iteration  produces  an 
acceptable  direction  by  letting 

»i  =  [  A,  .  X.  r.  ]-l  (149) 

irhere  A^  is  H^,  N^,  or  some  other  appropriate  matrix. 

Several  choices  are  available  for  the  matrix  P^^.  In  partricular, 
suppose  that  P^^  is  diagonal.  Ve  may  then  define  the  diagonal  matrix  6 
with  elements 


With  this  choice  of  6,  we  may  write  as 


(150) 


1.  =  f  A.  +  X.  G.^  G.  1~^. 
1  I  1  11  1  J 


(151) 


If  we  focus  specifically  on  the  Gauss  method,  then  the  equation  for 
the  step  direction  with  Uarquardt’s  modification  is 


f  B.^  B.  +  X.  G.^  G.  1  d  =  -  B.^  e..  (152) 

LiiiiiJ-  1-1 

This  can  be  solved  using  the  Cholesky  factorization  for  symmetric 

matrices,  or  we  can  note  that  these  are  simply  the  normal  equations  for 

the  linear  least-squares  problem 


d  +  X. 
-  1 


X. 


1/2 


1 


0 
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where  we  have  added  zero  to  the  right  hand  side.  The  i’th  step  direction 
cam  be  calculated  using  the  QR  factorization  for  the  linear  least~squaires 
problem 


B.  1 

e. 

1 

-1 

- —  — 

d  =  - 

— 

X.l/2  8. 

L  1  1  J 

OI 

whose  solution  is 


(153) 


(154) 


5.8  Step-Size  Determination 

Upon  determination  of  the  step  direction,  the  optimum,  or  near 
optiffliim,  step-size  is  determined  using  a  line  search  along  the  given 
direction.  Essentially  the  step-size  which  yields  the  minimum  residual 
sum  of  squares  is  found  by  increasing  the  step-size  (with  large 
increments)  until  the  steps  cease  to  cause  a  decrease  in  the  residual, 
then  small  decrements  are  taken  until  the  actual  optimum  value  is  found 
(when  the  steps  again  cease  to  yield  a  decrease  in  residual  sum  of 
squares) . 
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6.  RESULTS  AND  CONCLUSIONS 

Uonte  Carlo  simulations  were  run  to  test  the  effectiveness  of  the 
algoritihffl  outlined  in  this  report.  This  chapter  will  present  the 
transducer  model  used  in  the  computer  simulations,  the  results 
obtained,  and  conclusions  based  on  those  results. 

6.1  Transducer  Model 


The  transducer  model  used  was  a  two-pole  high-pass  filter  excited  by 
stepped  sinusoid  (Model  1) .  The  Laplace  transform  of  the  theoretical 
response  is 

“  ~2  ^  >  ‘'O  ~  ^^^0 

s  +Uq 

where 


=2  2 


The  peak  of  the  transfer  function  magnitude  IH(j(i;)  I  occurs  at 

*ml  =  *01 

The  Q  (quality  factor)  is  the  ratio  of  the  peak  response  frequency  to  the 
3  dB  bandwidth  and  is  approximately 

?!  :  l/2jj  - 

This  model  can  be  seen  as  either  the  acoustic  signal  from  a  projector 


modeled  as  a  2-pole  high-pass  or  as  the  electrical  signal  seen  at  the 
output  of  a  hydrophone  when  the  acoustic  signal  is  an  ideal  stepped 
sinusoid. 
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The  s-plane  pole  parameters  for  the  decaying  component  of  the 
transient  are 

and 

=  *01 

The  exact  time  function,  y(t),  is 

=  •^Q  cos(2rfQt)  +  Aq  sin(2TfQt) 

C  S 

+  e  cos(2xf.t)  +  A-  e  sin(2Tf-t) 

■‘■C  ^  ^ 

i?here 


^  _ 

®0  ■  *  [(Vfgj)2  .  i]2j  ■ 

°s  ■  {{2fjio/igj)2  *  [(yfgj)2  - 1]2}  ’ 

. _ _ 

*0  »  [(V*ci)^  * 


8.2  Simulation  Results 


All  results  were  calculated  based  on  100  Monte  Carlo  trials.  The 
signal-to-noise  ratio  used  was  defined  in  terms  of  the  steady  state 
amplitude  2is  follows: 
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SNR(dB)  =  10  log 


Tables  1,  2,  and  3  give  a  numerical  representation  of  the  results  for 
Q’s  of  4,  8,  and  12,  respectivelx.  In  each  table,  the  bias,  standard 
deviation,  root  mean  square  error,  and  Cramer-Rao  bound  are  given  for  the 
steady-state  amplitude  (Aq)  and  the  transient  damping  factor  (a^)  and 
frequency  (f^)  at  each  of  the  signal- to-noise  ratios  simulated. 

Figures  3-5  give  a  graphical  compairison  of  the  results  for  this 
computer  implementation  of  the  variable  projection  nonlinear  least- 
squares  method  to  those  of  the  principal  component  linear  prediction 
method  described  in  Chapter  2.  For  each  parameter  (Aq,  a^,  and  f ^) ,  a 
plot  of  normalized  mean  square  error  vs.  signal-to-noise  ratio  (MSB’s  are 
normalized  to  the  C-R  bound)  for  each  Q  (4,  8,  and  12). 
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Table  1:  Normalized  Estimator  Results  for  Parameter  A. 


Bias 


Standard 

Deviation 


RMS  Error 


C-R  Bound 


Q  SNR 


4 

25 

4 

27 

4 

30 

4 

33 

4 

37 

4 

40 

4 

SO 

4 

60 

8 

25 

8 

27 

8 

30 

8 

33 

8 

37 

8 

40 

8 

50 

8 

60 

12 

25 

12 

27 

12 

30 

12 

33 

12 

37 

12 

40 

12 

50 

12 

60 

3,1397E-02 

3.84g4E-03 

3.3125E-03 

2.4347E-03 

1.7988E-03 

1.0275E-03 

2.6404E-04 

8.3761E-05 

-9.3672E-02 

-1.1479E-03 

1.5533E-02 

1.068SE-02 

8.8030E-03 

S.2801E-03 

1.7833E-03 

4.8134E-04 

-3,02S6E-01 

1.5185E-01 

7.80S7E-02 

3.8631E-02 

1.8663E-02 

1.4054E-02 

4.0782E-03 

1.2375E-03 


7.8014E-02 

5.5601E-02 

3.8165E-02 

2.6829E-02 

1.7345E-02 

1.1992E-02 

3.7056E-03 

1.1663E-03 

2.6597E-01 

2.2148E-01 

1.1790E-01 

7.8684E-02 

4.6857B-02 

3.2613E-02 

9.7773E-03 

3.2679E-03 

4.0855E-01 

4.0974E-01 

6.0740E-01 

2.0769E-01 

1.0343E-01 

6.9019E-02 

2.1160E-02 

8.5496B-03 


I  MSE. 


7.7686E-02 

S.5456E-02 

3.8118E-02 

2.6805E-02 

1.7352E-02 

1.1976E-02 

3.6965e-03 

1.1634E-03 

2.8072E-01 

2.2038E-01 

1.1834E-01 

7.9015E-02 

4.7116E-02 

3.2877E-02 

9.8904E-03 

3.2869E-03 

5.0674E-01 

4.3505E-01 

6.0938E-01 

2.1023E-01 

1.0459E-01 

7.0096E-02 

2.1445E-02 

6.6333E-03 


S.7052E-02 

4.5318E-02 

3.2083E-02 

2.2713E-02 

1.4331E-02 

1.0145E-02 

3.2083E-03 

1.0145E-03 

1.7476E-01 

1.3882E-01 

9.8276E-02 

6.9575E-02 

4.3899E-02 

3.1078E-02 

9.8276E-03 

3.1078E-03 

3.7311E-01 

2.9637E-01 

2.0981E-01 

1.4854E-01 

9.3721E-02 

6.6349E-02 

2.0981E-02 

6.6349E-03 
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Table  2:  Estimator  Results  for  Parameter 


Standard 

1 

Q 

SNR 

Bias 

Deviation 

RMS  Error 

C-R  Bound 

1  MSE 

‘^CR 

“l  ■  “l 

“l 

“l 

4 

25 

-1.0488E-02 

2.9812E-01 

'2.9681E-02 

1 . 1262E-01 

4 

27 

1.2867E-02 

9.4643E-02 

9.5044E-02 

8.9461E-02 

4 

30 

1.0796E-02 

6.5190E-02 

6.5755E-02 

6.3334E-02 

4 

33 

7.9022E-03 

4.7570E-02 

4.7987E-02 

4.4836E-02 

4 

37 

5.4118E-03 

3.1725E-02 

3.2026E-02 

2.8290E-02 

4 

40 

3.3593E-03 

2.1771E-02 

2.1921E-02 

2.0028E-02 

4 

50 

9.3497E-04 

6.9761E-03 

7.0038E-03 

6.3334E-03 

4 

60 

2.9473E-04 

2.2050E-03 

2.2246E-03 

2.0028E-03 

8 

25 

-2.2919E-01 

5.3391E-01 

5.7857E-01 

1.2295E-01 

8 

27 

-4.5185E-02 

2.5344E-01 

2.5619E-01 

9.7661E-02 

8 

30 

9.5538E-03 

6.7355E-02 

6.7695E-02 

6.9138E-02 

8 

33 

7.4555E-03 

4.8321E-02 

4.8654E-02 

4.8946E-02 

8 

37 

5.0330E-03 

3.02S6E-02 

3.0522E-02 

3.0883E-02 

8 

40 

4.0504E-03 

2.1383E-02 

2.1658E-02 

2.1863B-02 

8 

50 

1.4110E-03 

6.5945E-03 

6.7114E-03 

6.9138E-03 

8 

60 

3.8313E-04 

2.2454E-03 

2.2667E-03 

2.1863E-03 

12 

25 

-5,6826E-01 

7.4937E-01 

9.3747E-01 

1.4781E-01 

12 

27 

-3,3185E-01 

6.0141E-01 

6.8425E-01 

1.1741E-01 

12 

30 

-2.7766E-02 

2.2919E-01 

2.2972E-01 

8.3121E-02 

12 

33 

9.7796E-03 

5.7614E-02 

5.8153E-02 

5.8845E-02 

12 

37 

6.5181E-03 

3.5428E-02 

3.5848E-02 

3.7129E-02 

12 

40 

5.2898E-03 

2.4906e-02 

2.5339e-02 

2.6285E-02 

12 

50 

1.6507e-03 

8.1051e-03 

8.2316e-03 

8.3121E-03 

12 

60 

5.1621e-04 

2.5330e-03 

2.5727e-03 

2.6285E-03 
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Table  3:  Estimator  Results  for  Parameter  f 


Q 

SNR 

4 

25 

4 

27 

4 

30 

4 

33 

4 

37 

4 

40 

4 

SO 

4 

60 

8 

25 

8 

27 

8 

30 

8 

33 

8 

37 

8 

40 

8 

SO 

8 

60 

12 

25 

12 

27 

12 

30 

12 

33 

12 

37 

12 

40 

12 

50 

12 

60 

Bias 


-1.0416E-02 

-2.6716E-03 

-1.37S9E-03 

-9.6422E-04 

-6.0650E-04 

-4.2107E-04 

-1.0745E-04 

-3.1262E-05 

-9.8704E-02 

-2.9S63E-02 

-2.2884E-03 

-1.4406E-03 

-7.6443E-04 

-6,2716E-04 

-1.5717E-04 

-4.1869E-05 

-2.3453E-01 

-1.3253E-01 

-2.4989E-02 

-1.7707E-03 

-9.7337E-04 

-7.3483E-04 

-2,3559E-04 

-7.4384E-05 


Standard 

Deviation 


8.0899E-02 

1.7504E-02 

1.2783E-02 

9.0174E-03 

5.3263E-03 

3.7593E-03 

1.2411E-03 

3.8806E-04 

1.982SE-01 

1.1967E-01 

1.1726E-02 

8.4651E-03 

5.2576E-03 

3.6980E-03 

1.2063E-03 

3.8314E-04 

2.6917E-01 

2.2330E-01 

1.1681E-01 

9.5334E-03 

6.1725E-03 

4.3372E-03 

1.3686E-03 

4.1825E-04 


RMS  Error 
1  MSE. 

1 

8.1165E-02 

1.7620E-02 

1.2793E-02 

9.0239E-03 

S.3342E-03 

3.7641E-03 

1.2396E-03 

3.8738E-04 

2.2057E-01 

1.5051E-01 

1.1890E-02 

8.5450E-03 

5.2868E-03 

3.7325E-03 

1.2105E-03 

3.3869E-04 

3.5599E-01 

2.5870E-01 

1.1888E-01 

9.6495E-03 

6.2183E-03 

4.3776E-03 

1.3820E-03 

4.2275E-04 


C-R  Bound 
‘^CR. 

2.1509E-02 

1.7085E-02 

1.2095E-02 

8.5629E-03 

5.4028E-03 

3.8249E-03 

1.2095E-03 

3.8249E-04 

2.1502E-02 

1.7080E-02 

1.2091E-02 

8.5600E-03 

5.4010B-03 

3.8236E-03 

1.2091E-03 

3.8236E-04 

2.5071E-02 

1.9920E-02 

1.4102E-02 

9.9836E-03 

6.2992E-03 

4.45g5E-03 

1.4102E-03 

4.4595R-04 


Figure  3.  MSE(alphai)  rel  Cramer-Rao  Lower  Bound 

Excitation  at  Resonance  of  2-pole  high-pass 
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Figure  5.  MSE(Ao)  rel  Cramer-Rao  Lower  Bound 

Excitation  at  Resonance  of  2-pole  high-pass 
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6.3  Conclusions 

The  results  given  demonstrate  that  this  computer  implementation  of 
the  variable  projection  nonlineair  least-squares  algorithm  exhibits 
maximum  likelihood  performance  above  a  threshold  signal-to-noise  ratio 
which  depends  upon  the  Q  of  the  model.  Below  the  threshold  SNR,  this 
implementation  departs  from  maximum  likelihood  performance.  Judging  from 
the  behavior  of  the  results  and  observations  of  the  convergence  behavior 
below  the  threshold  SNR,  it  appears  that  the  current  implementation  of 
the  variable  projection  method  is  unable  to  converge  when  the  initial 
estimate  from  the  principal  component  linear  prediction  method  is  far 
removed  from  the  true  minimum  of  the  variable  projection  functional  (see 
Appendix  B) .  Future  work  will  examine  alternative  step-direction  and 
step-size  search  methods  which  should  improve  the  performance  at  these 
low  signal-to-noise  ratios. 
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APPENDICES 


A.  Factorisation  bx  Successire  Householder  Transformation 


The  Householder  reflector  is  defined  [9]  as 
U  =  I  -  u 

such  that 

U  X  =  -  <r 

•where 

a  =  sgn(x^)  I |xl I, 

=  *1  +  o, 

Ui  =  •  •  • »“ 

and 


P  =  a  Uj^. 

When  triangularizing  the  matrix  F  using  successive  Householder 
reflectors,  each  column  of  F  is  transformed  by  reflectors  formed  from 
each  of  the  preceeding  columns;  i.e.  denoting  as  the  Householder 
reflector  which  zeros  the  elements  below  the  i’th  diagonal,  then  we 
define 


h-i  I 


I.  = 
1 


such  that  H^  zeros  the  elements  below  the  first  diagonal  and  transforms 
columns  2  through  M,  H2  zeros  the  elements  below  the  second  diagonal 
and  transforms  columns  3  through  M,  and  so  on.  Thus  the  i’th  reflector 
effects  only  the  rightmost  M-i+1  columns  and  the  lower  N-i+1  rows. 

After  the  reflector  has  been  constructed  for  all  M  columns,  the 
orthogonal  matrix  is  defined  as 
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Furthermore,  since  only  the  vector  and  the  scalar  are 
necessary  for  forming  at  each  stage,  all  information  concerning  the 
construction  of  %  can  be  saved  by  storing  the  last  n-i  elements  of  u^^ 
below  below  the  i’th  diagonal  element  of  F  and  storing  the  pre¬ 
transformation  value  of  the  diagonal  element  in  an  auxiliary  vector 
(note  that  the  post- transformat ion  value  of  the  diagonal  is  so 

that  is  indirectly  available) . 


73 


B.  Variable  Projection  Functional  Contour  and  Surface  Plots 

This  appendix  contains  contour  and  surface  plots  for  the  variable 
projection  functional  of  the  tvo-pole  high-pass  filter  transducer  model 
(see  Section  6.1).  For  a  given  parameter  set,  the  noiseles  observation 
vector,  and  the  known  excitation  frequency  will  be  fixed  and  this 

functional  can  be  written  solely  in  terms  of  the  parameters  a  and  f  as 

RSSCa.f)  =  j|  I-PF(a.f)  I  1|  ' 

For  each  of  these  signal  parameters  sets 

{N=16,  T=0.25,  Q=4,  fml=1.0,  f 0=1.0} 

{N=18,  T=0.25,  Q=8,  fml=1.0,  f0=1.0} 

{N=18,  T=0.25,  Q=12,  fml=1.0,  f 0=1.0} 

{N=16,  T=0.25,  Q=4,  fml=1.0,  f0=0.5} 

contour  plots  are  given  with  a  in  Nepers  given  along  the  horizontal  axis 
and  f  in  Hertz  given  along  the  vertical  axis.  Following  each  contour 
plot,  the  following  views  of  each  error  siirface  (RSS  vs.  a  and  f)  are 
provided: 

1.  Surface  viewed  from  large  f  and  small  a 

2.  Surface  viewed  from  small  f  and  large  a 

3.  Surface  viewed  from  the  a=0  plane  (front  side) 

4.  Surface  viewed  from  the  a=1.5  plane  (back  side). 

These  plots  exhibit  the  flat  nature  of  the  functional’s  surface  as 
the  frequency  estimate  (f)  becomes  far  removed  from  the  true  minimum  of 
the  functional.  It  is  in  these  areas  particularly  that  the  problems 
described  in  Section  6.3  occur. 
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The  non-resonance  excitation  case  (f0=0.5)  was  included  above  for 
comparison.  This  contour  suggests  a  smoother  surface  than  the 
excitation-at-resonance  case . 


Model  1  RSS  Contour  N=16  0=4  fO=l  al=0.7304 


Model_l  RSS  Surface  N=16  Q=4  fO=l  al=0.7304  fl=0.979  (-38.8,30) 
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Model  1  RSS  Surface  N=16  0=4  fO=l  al=0.7304  fl=0.979  (142.2,30) 
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Model  1  RSS  Surface  N=16  0=4  fO=l  al=0.7304  fl=0.979  (270,0) 
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RSS  Contour  N=16  0=8  fO=l  al=0. 06155 


Model  1  RSS  Surface  N=16  0=8  fO=l  al=0. 06155  fl=0.9962  (-38.8,30) 
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Model  1  RSS  Surface  N=16  0=8  fO=l  al=0. 06165 


Model  1  RSS  Surface  N=16  0=8  fO=l  al=0. 06155  fl=0.9962  (270.0) 


Model  1  RSS  Surface  N»16  0=12  fO=l  al=0. 04139  fl=0.9983  (142.2,30) 
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Model_l  RSS  Surface  N=»16  Q=12  fO=l  al«0. 04139  fl=0.9983  (270,0) 


.7304 


Model  1  RSS  Surface  N=16  Q=4  f0=0.5  al=0.7304  fl==0.979 


